AITopics | subgradient method

Fast Zeroth-Order Convex Optimization with Quantum Gradient Methods

Neural Information Processing Systems | Jun-18-2026, 18:57:15 GMT

We study quantum algorithms based on quantum (sub)gradient estimation using noisy function evaluation oracles, and demonstrate the first dimension-independent query complexities (up to poly-logarithmic factors) for zeroth-order convex optimization in both smooth and nonsmooth settings. Interestingly, only using noisy function evaluation oracles, we match the first-order query complexities of classical gradient descent, thereby exhibiting exponential separation between quantum and classical zeroth-order optimization. We then generalize these algorithms to work in non-Euclidean settings by using quantum (sub)gradient estimation to instantiate mirror descent and its variants, including dual averaging and mirror prox. By leveraging a connection between semidefinite programming and eigenvalue optimization, we use our quantum mirror descent method to give a new quantum algorithm for solving semidefinite programs, linear programs, and zero-sum games. We identify a parameter regime in which our zero-sum games algorithm is faster than any existing classical or quantum approach.
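As a point of reference for the mirror descent and zero-sum game setting mentioned above, here is a minimal classical sketch (not the quantum algorithm from the paper) of entropic mirror descent, also known as multiplicative weights, on a small matrix game min_x max_y x^T A y. The function names, the payoff matrix in the usage note, and the step-size choice are illustrative assumptions, not taken from the paper.

```python
import math


def _normalise(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def entropic_mirror_descent_game(A, steps):
    """Classical entropic mirror descent for min_x max_y x^T A y.

    x and y live on probability simplices; the averaged iterates are returned.
    Hypothetical helper for illustration only.
    """
    m, n = len(A), len(A[0])
    # The largest payoff magnitude bounds every gradient entry.
    G = max(abs(a) for row in A for a in row)
    # Standard step size for the exponential-weights regret bound (assumed choice).
    eta = math.sqrt(2.0 * math.log(max(m, n)) / (steps * G * G))
    x = [1.0 / m] * m
    y = [1.0 / n] * n
    x_avg = [0.0] * m
    y_avg = [0.0] * n
    for _ in range(steps):
        x_avg = [a + xi / steps for a, xi in zip(x_avg, x)]
        y_avg = [a + yj / steps for a, yj in zip(y_avg, y)]
        grad_x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]  # A y
        grad_y = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]  # A^T x
        # Mirror step under the entropy mirror map: multiplicative update, then renormalise.
        x = _normalise([xi * math.exp(-eta * g) for xi, g in zip(x, grad_x)])
        y = _normalise([yj * math.exp(eta * g) for yj, g in zip(y, grad_y)])
    return x_avg, y_avg


def duality_gap(A, x, y):
    """Best-response gap: max_j (A^T x)_j - min_i (A y)_i, zero at equilibrium."""
    m, n = len(A), len(A[0])
    best_for_y = max(sum(A[i][j] * x[i] for i in range(m)) for j in range(n))
    best_for_x = min(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    return best_for_y - best_for_x
```

For example, `entropic_mirror_descent_game([[2.0, -1.0], [-1.0, 1.0]], 5000)` returns averaged strategies whose `duality_gap` is small; the classical regret analysis bounds the gap by a quantity that shrinks like the inverse square root of the number of steps.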

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)


Sliced Inner Product Gromov-Wasserstein Distances

Gong, Xiaoyun, Rioux, Gabriel, Goldfeld, Ziv

arXiv.org Machine Learning | May-12-2026

The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.08546

Country: North America > United States (0.92)

Genre: Research Report (0.63)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems

Neural Information Processing Systems | Feb-11-2026, 12:07:47 GMT

We give a precise trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method.

artificial intelligence, machine learning, optimal solution, (16 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)


A Further Related Work on Nonsmooth Nonconvex Optimization

Neural Information Processing Systems | Feb-11-2026, 05:11:06 GMT

To appreciate the difficulty and the broad scope of the research agenda in nonsmooth nonconvex optimization, we start by describing the existing relevant literature. First, the existing work is mostly devoted to establishing the asymptotic convergence properties of various optimization algorithms, including gradient sampling (GS) methods [16-18, 57, 19], bundle methods [56, 40] and subgradient methods [8, 65, 30, 28, 12]. More specifically, Burke et al. [16] provided a systematic investigation of approximating the Clarke subdifferential through random sampling and proposed a gradient bundle method [17], the precursor of GS methods, for optimizing a nonconvex, nonsmooth and non-Lipschitz function. Later, Burke et al. [18] and Kiwiel [57] proposed the GS methods by incorporating key modifications into the algorithmic scheme in Burke et al. [17] and proved that every cluster point of the iterates generated by GS methods is a Clarke stationary point. For an overview of GS methods, we refer to Burke et al. [19].

artificial intelligence, inequality, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)


Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

Oikonomou, Dimitris, Loizou, Nicolas

arXiv.org Machine Learning | Dec-22-2025

The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. On non-smooth convex benchmarks, our experiments are consistent with the theoretical predictions on how the safeguard affects the convergence neighborhood. On deep neural networks, the proposed step size achieves performance competitive with existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Moreover, in these experiments, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.
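For readers unfamiliar with Polyak-type step sizes, the following is a minimal deterministic sketch of the classical rule, step = (f(x) - f*) / ||g||^2, applied to the subgradient method on the non-smooth convex function f(x) = ||x||_1. It is not the paper's SPS$_{safe}$ rule, whose exact form the abstract does not state. The cap `max_step` is an illustrative bound in the spirit of the earlier SPS_max variant, and all function names and the test problem are assumptions made for this example.

```python
def l1_norm(x):
    """f(x) = ||x||_1, a non-smooth convex function with minimum value 0."""
    return sum(abs(v) for v in x)


def l1_subgradient(x):
    """One valid subgradient of the l1 norm: the sign vector (0 at kinks)."""
    return [float((v > 0) - (v < 0)) for v in x]


def polyak_subgradient(x0, f, subgrad, f_star, steps, max_step=1.0):
    """Subgradient method with a capped Polyak step size.

    Assumes the optimal value f_star is known; `max_step` is a hypothetical cap.
    Returns the last iterate and the best objective value seen.
    """
    x = list(x0)
    best = f(x)
    for _ in range(steps):
        g = subgrad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # a zero subgradient certifies a minimiser of a convex f
        step = min((f(x) - f_star) / g_sq, max_step)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        best = min(best, f(x))
    return x, best
```

Calling `polyak_subgradient([3.0, -2.0], l1_norm, l1_subgradient, 0.0, 50)` drives the iterate to the minimiser at the origin. The reliance on a known `f_star` is exactly the kind of requirement the abstract describes as a limitation of earlier non-smooth extensions.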

polyak step size, step size, theorem 3, (14 more...)

arXiv.org Machine Learning

2512.02342

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)


Nonmonotone subgradient methods based on a local descent lemma

Aragón-Artacho, Francisco J., Campoy, Rubén, Pérez-Aros, Pedro, Torregrosa-Belén, David

arXiv.org Artificial Intelligence | Oct-23-2025

The aim of this paper is to extend the context of nonmonotone descent methods to the class of nonsmooth and nonconvex functions called upper-$\mathcal{C}^2$, which satisfy a nonsmooth and local version of the descent lemma. Under this assumption, we propose a general subgradient method that performs a nonmonotone linesearch, and we prove subsequential convergence to a stationary point of the optimization problem. Our approach allows us to cover the setting of various subgradient algorithms, including Newton and quasi-Newton methods. In addition, we propose a specification of the general scheme, named Self-adaptive Nonmonotone Subgradient Method (SNSM), which automatically updates the parameters of the linesearch. Particular attention is paid to the minimum sum-of-squares clustering problem, for which we provide a concrete implementation of SNSM. We conclude with some numerical experiments where we exhibit the advantages of SNSM in comparison with some known algorithms.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.19341

Country:

Europe > Spain (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)


Minimisation of Submodular Functions Using Gaussian Zeroth-Order Random Oracles

Farzin, Amir Ali, Pun, Yuen-Man, Braun, Philipp, Summers, Tyler, Shames, Iman

arXiv.org Artificial Intelligence | Oct-20-2025

We consider the minimisation problem of submodular functions and investigate the application of a zeroth-order method to this problem. The method is based on exploiting a Gaussian smoothing random oracle to estimate the smoothed function gradient. We prove the convergence of the algorithm to a global $ε$-approximate solution in the offline case and show that the algorithm is Hannan-consistent in the online case with respect to static regret. Moreover, we show that the algorithm achieves $O(\sqrt{NP_N^\ast})$ dynamic regret, where $N$ is the number of iterations and $P_N^\ast$ is the path length. The complexity analysis and hyperparameter selection are presented for all the cases. The theoretical results are illustrated via numerical examples.
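To make the idea of a Gaussian smoothing random oracle concrete, here is a minimal sketch of the standard two-point estimator g = ((f(x + mu*u) - f(x)) / mu) * u with u drawn from a standard normal distribution. This is an assumption about the general technique, not the paper's exact oracle. The paper works with submodular functions, whereas this example uses a smooth quadratic purely so the averaged estimate can be compared with a known gradient; the function names and parameters are hypothetical.

```python
import random


def gaussian_zeroth_order_gradient(f, x, mu, rng):
    """One-sample estimate of the gradient of the Gaussian-smoothed f at x.

    Uses only two function evaluations: f(x + mu * u) and f(x).
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(shifted) - f(x)) / mu
    return [scale * ui for ui in u]


def averaged_estimate(f, x, mu, samples, seed=0):
    """Average many one-sample estimates to reduce variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = gaussian_zeroth_order_gradient(f, x, mu, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


def sum_of_squares(x):
    """Smooth test function with gradient 2 * x (illustration only)."""
    return sum(v * v for v in x)
```

With `averaged_estimate(sum_of_squares, [1.0, -2.0], 1e-3, 20000)`, the result should lie close to the true gradient `[2.0, -4.0]`. A single sample is noisy, which is why zeroth-order methods of this kind typically pair the oracle with small step sizes or averaging.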

artificial intelligence, Lovász extension, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.15257

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

Díaz, Mateo, Jiang, Liwei, Labassi, Abdel Ghani

arXiv.org Artificial Intelligence | Oct-6-2025

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a certain sense, ill-conditioned or overparameterized, the subgradient method exhibits much slower sublinear convergence even when the convex function is well-conditioned. To overcome this limitation, we introduce a Levenberg-Morrison-Marquardt subgradient method that converges linearly under mild regularity conditions at a rate determined solely by the convex function. Further, we demonstrate that these regularity conditions hold for several problems of practical interest, including square-variable formulations, matrix sensing, and tensor factorization. Numerical experiments illustrate the benefits of our method.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2509.11486

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)


New Insights and Algorithms for Optimal Diagonal Preconditioning

Ghadimi, Saeed, Jung, Woosuk L., Sujanani, Arnesh, Torregrosa-Belén, David, Wolkowicz, Henry

arXiv.org Artificial Intelligence | Sep-30-2025

Preconditioning (scaling) is essential in many areas of mathematics, and in particular in optimization. In this work, we study the problem of finding an optimal diagonal preconditioner. We focus on minimizing two different notions of condition number: the classical, worst-case type, $κ$-condition number, and the more averaging-motivated $ω$-condition number. We provide affine-based pseudoconvex reformulations of both optimization problems. The advantage of our formulations is that the gradient of the objective is inexpensive to compute and the optimization variable is just an $n\times 1$ vector. We also provide elegant characterizations of the optimality conditions of both problems. We develop a competitive subgradient method, with convergence guarantees, for $κ$-optimal diagonal preconditioning that scales much better and is more efficient than existing SDP-based approaches. We also show that the preconditioners found by our subgradient method lead to better PCG performance for solving linear systems than other approaches. Finally, we show the interesting phenomenon that we can apply the $ω$-optimal preconditioner to the exact $κ$-optimally diagonally preconditioned matrix $A$ and get consistent, significantly improved convergence results for PCG methods.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2509.23439

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)


Solving Constrained Stochastic Shortest Path Problems with Scalarisation

Schmalz, Johannes, Trevizan, Felipe

arXiv.org Artificial Intelligence | Aug-26-2025

Constrained Stochastic Shortest Path Problems (CSSPs) model problems with probabilistic effects, where a primary cost is minimised subject to constraints over secondary costs, e.g., minimise time subject to monetary budget. Current heuristic search algorithms for CSSPs solve a sequence of increasingly larger CSSPs as linear programs until an optimal solution for the original CSSP is found. In this paper, we introduce a novel algorithm, CARL, which solves a series of unconstrained Stochastic Shortest Path Problems (SSPs) with efficient heuristic search algorithms. These SSP subproblems are constructed with scalarisations that project the CSSP's vector of primary and secondary costs onto a scalar cost. CARL finds a maximising scalarisation using an optimisation algorithm similar to the subgradient method which, together with the solution to its associated SSP, yields a set of policies that are combined into an optimal policy for the CSSP. Our experiments show that CARL solves 50% more problems than the state-of-the-art on existing benchmarks.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.17446

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
